{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Simple Fusion Retriever\n",
    "\n",
    "In this example, we walk through how you can combine retrieval results from multiple queries and multiple indexes. \n",
    "\n",
    "The retrieved nodes will be returned as the top-k across all queries and indexes, as well as handling de-duplication of any nodes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import openai\n",
    "\n",
    "os.environ[\"OPENAI_API_KEY\"] = \"sk-...\"\n",
    "openai.api_key = os.environ[\"OPENAI_API_KEY\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup\n",
    "\n",
    "For this notebook, we will use two very similar pages of our documentation, each stored in a separaete index."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import SimpleDirectoryReader\n",
    "\n",
    "documents_1 = SimpleDirectoryReader(\n",
    "    input_files=[\"../../community/integrations/vector_stores.md\"]\n",
    ").load_data()\n",
    "documents_2 = SimpleDirectoryReader(\n",
    "    input_files=[\"../../module_guides/storing/vector_stores.md\"]\n",
    ").load_data()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import VectorStoreIndex\n",
    "\n",
    "index_1 = VectorStoreIndex.from_documents(documents_1)\n",
    "index_2 = VectorStoreIndex.from_documents(documents_2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Fuse the Indexes!\n",
    "\n",
    "In this step, we fuse our indexes into a single retriever. This retriever will also generate augment our query by generating extra queries related to the original question, and aggregate the results.\n",
    "\n",
    "This setup will query 4 times, once with your original query, and generate 3 more queries.\n",
    "\n",
    "By default, it uses the following prompt to generate extra queries:\n",
    "\n",
    "```python\n",
    "QUERY_GEN_PROMPT = (\n",
    "    \"You are a helpful assistant that generates multiple search queries based on a \"\n",
    "    \"single input query. Generate {num_queries} search queries, one on each line, \"\n",
    "    \"related to the following input query:\\n\"\n",
    "    \"Query: {query}\\n\"\n",
    "    \"Queries:\\n\"\n",
    ")\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.retrievers import QueryFusionRetriever\n",
    "\n",
    "retriever = QueryFusionRetriever(\n",
    "    [index_1.as_retriever(), index_2.as_retriever()],\n",
    "    similarity_top_k=2,\n",
    "    num_queries=4,  # set this to 1 to disable query generation\n",
    "    use_async=True,\n",
    "    verbose=True,\n",
    "    # query_gen_prompt=\"...\",  # we could override the query generation prompt here\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# apply nested async to run in a notebook\n",
    "import nest_asyncio\n",
    "\n",
    "nest_asyncio.apply()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Generated queries:\n",
      "1. What are the steps to set up a chroma vector store?\n",
      "2. Best practices for configuring a chroma vector store\n",
      "3. Troubleshooting common issues when setting up a chroma vector store\n"
     ]
    }
   ],
   "source": [
    "nodes_with_scores = retriever.retrieve(\"How do I setup a chroma vector store?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Score: 0.78 - # Vector Stores\n",
      "\n",
      "Vector stores contain embedding vectors of ingested document chunks\n",
      "(and sometimes ...\n",
      "Score: 0.78 - # Using Vector Stores\n",
      "\n",
      "LlamaIndex offers multiple integration points with vector stores / vector dat...\n"
     ]
    }
   ],
   "source": [
    "for node in nodes_with_scores:\n",
    "    print(f\"Score: {node.score:.2f} - {node.text[:100]}...\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Use in a Query Engine!\n",
    "\n",
    "Now, we can plug our retriever into a query engine to synthesize natural language responses."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.query_engine import RetrieverQueryEngine\n",
    "\n",
    "query_engine = RetrieverQueryEngine.from_args(retriever)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Generated queries:\n",
      "1. How to set up a chroma vector store?\n",
      "2. Step-by-step guide for creating a chroma vector store.\n",
      "3. Examples of chroma vector store setups and configurations.\n"
     ]
    }
   ],
   "source": [
    "response = query_engine.query(\n",
    "    \"How do I setup a chroma vector store? Can you give an example?\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "**`Final Response:`** To set up a Chroma vector store, you need to follow these steps:\n",
       "\n",
       "1. Import the necessary libraries:\n",
       "```python\n",
       "import chromadb\n",
       "from llama_index.vector_stores.chroma import ChromaVectorStore\n",
       "```\n",
       "\n",
       "2. Create a Chroma client:\n",
       "```python\n",
       "chroma_client = chromadb.EphemeralClient()\n",
       "chroma_collection = chroma_client.create_collection(\"quickstart\")\n",
       "```\n",
       "\n",
       "3. Construct the vector store:\n",
       "```python\n",
       "vector_store = ChromaVectorStore(chroma_collection=chroma_collection)\n",
       "```\n",
       "\n",
       "Here's an example of how to set up a Chroma vector store using the above steps:\n",
       "\n",
       "```python\n",
       "import chromadb\n",
       "from llama_index.vector_stores.chroma import ChromaVectorStore\n",
       "\n",
       "# Creating a Chroma client\n",
       "# EphemeralClient operates purely in-memory, PersistentClient will also save to disk\n",
       "chroma_client = chromadb.EphemeralClient()\n",
       "chroma_collection = chroma_client.create_collection(\"quickstart\")\n",
       "\n",
       "# construct vector store\n",
       "vector_store = ChromaVectorStore(chroma_collection=chroma_collection)\n",
       "```\n",
       "\n",
       "This example demonstrates how to create a Chroma client, create a collection named \"quickstart\", and then construct a Chroma vector store using that collection."
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from llama_index.core.response.notebook_utils import display_response\n",
    "\n",
    "display_response(response)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "llama_index_v3",
   "language": "python",
   "name": "llama_index_v3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
